Identifying communities by influence dynamics in social networks 
Communities are not static; they evolve, split and merge, appear and disappear, i.e., they are the
product of dynamical processes that govern the evolution of the network. A good algorithm for
community detection should not only quantify the topology of the network, but also incorporate the
dynamical processes that take place on the network. We present a novel algorithm for community
detection that combines network structure with processes that support the creation and/or evolution
of communities. The algorithm does not embrace the universal approach but instead focuses on
social networks and models the dynamic social interactions that occur on those networks. It identifies
leaders and the communities that form around those leaders. It naturally supports overlapping
communities by associating each node with a membership vector that describes the node's involvement
in each community. This way, in addition to the overlapping communities, we can identify nodes that
are good followers of their leader, as well as nodes with no clear community involvement that serve
as proxies between several communities and are equally important. We run the algorithm on
several real social networks, which we believe represent a good fraction of the wide body of social
networks, and discuss the results, including other possible applications.

PACS numbers: 89.75.Hc, 02.50.Ga, 05.40.Fb 



I. INTRODUCTION 

Biological, technological and social complex systems 
are networked: their structure can be represented as net- 
works of interacting components. This makes networks 
a very powerful tool for understanding the structure, dy- 
namics and evolution of complex systems p]. Very often 
these networks exhibit modular and hierarchical structure
that supports their evolution into highly complex
systems [SHI]. The automatic detection of this modular
structure - also known as community detection - can help
identify closely related classes of nodes and give a
principled way of understanding the organization of complex
systems 0- 

However, current research for community detection fo- 
cuses on finding algorithms that can identify communi- 
ties in all contexts [3j 0J [21] . This universal approach has 
many drawbacks, the most important being that these 
algorithms fail to explain the produced partition. In or- 
der for algorithms to be usable in practical contexts, we 
need to incorporate context-based knowledge about how 
communities are built and how they evolve. For example,
in social networks the communities are usually built
around some important individuals or groups of individuals
called leaders. In communication networks, modules
are built around highly connected hubs, and in paper
citation networks, communities correspond to the different
research areas and important papers in those areas.

Also, communities are not static; they evolve, split
and merge, appear and disappear, i.e., they are the product
of dynamical processes that govern the evolution of
the network. Therefore, a good algorithm should
not only quantify the topology of the network, but also
incorporate the dynamical processes that take place on the
network. Since there are many dynamical processes
(spreading of diseases, packet routing, viral marketing,
random walks, consensus dynamics, etc.), it is difficult
to produce an algorithm that will perform well on every
complex network. Communities in networks often
overlap, such that nodes simultaneously belong to several
groups [5]. Surprisingly, however, this property
was continuously disregarded until recently [9]-[12],
when a few algorithms were introduced, but their
number is still substantially smaller than the number of
non-overlapping algorithms.

In this paper we present a novel algorithm for com- 
munity detection with focus on social networks. The al- 
gorithm does not embrace the universal approach but 
instead tries to focus on social networks and model the 
dynamic social interactions that occur on those networks. 
This helps to identify leaders and communities that form 
around those leaders. It naturally supports overlapping 
communities by associating each node with a membership
vector that describes the node's involvement in each
community.

The outline of the paper is as follows. Section II
introduces the problem of community detection in social
networks by using the famous Zachary social network as
a case study to explain the problems with the current
approaches and the motivation behind our algorithm. In
Section III we present our algorithm and explain its steps.
In Section IV we run the algorithm on several real social
networks, which we believe represent a good fraction of
the wide body of social networks. We also discuss other
possible applications. Section V concludes this paper.




II. MOTIVATION 

The most important parts of a social network are its
ties, or connections, which denote some kind of social
relationship. We believe that a simple quantification of these
connections has many drawbacks. A good method for
community detection must focus on the social relationship
rather than on the bare connection, i.e., it must focus
on the processes that support these connections and the
creation of communities. In this section we use the Zachary
social network [13] as a case study to discuss some of
the drawbacks of current methods. We also explain the
motivation behind the proposed algorithm.

The most popular methods for community detection
are based on the modularity quality function. Since its
introduction [20], there have been many community
detection algorithms that use the modularity function as
a basis [21]-[23]. These algorithms usually optimize this
function in order to achieve a greater modularity value
and, consequently, a better community detection.
Recently, however, the focus on the modularity function seems
to have been lost, mainly because of the discovered
shortcomings of the function. Among others, the two most
important are the resolution limit of the modularity function
[24]-[27] and the structural diversity of high-modularity
partitions [28]. Basically, the optimal partition may not
coincide with the most intuitive partition.
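
For reference, the modularity of a given partition can be computed directly from the adjacency matrix. The sketch below assumes the standard Newman-Girvan form for undirected networks, $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$. The function name `modularity` and the toy graph (two triangles joined by one link) are illustrative and are not taken from the paper.

```python
def modularity(A, communities):
    """Newman-Girvan modularity of a partition of an undirected network.

    A is a symmetric adjacency matrix (list of lists) and communities[i]
    is the community label of node i.
    """
    n = len(A)
    degrees = [sum(row) for row in A]
    two_m = sum(degrees)  # twice the total link weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # observed weight minus the null-model expectation
                q += A[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Illustrative graph: triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Here `q_split` is positive while `q_single` vanishes, because the null-model term cancels the observed links when all nodes share one community. The value depends only on link counts and degrees, not on any process running on the network.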

We found another shortcoming of the modularity function
on border-case nodes (see Appendix A). Let us look at
the Zachary social network, with focus on the node with id 10
(Fig. 1). We will ignore the coloring of the nodes for now.
Since this is a social network, we should consider the social
dynamics that are taking place, namely the influence
spreading over the network. Node 10 has two neighbors:
node 34, which is denoted by the author as the leader of the
first community, and node 3, which is a neighbor of node
1, the leader of the second community. Clearly, node 34
has more influence in the network than node 3, so, for
example, if elections were held in the karate club,
node 10 would most probably vote for node 34 rather than for
node 1. Consequently, node 10 should belong to the first
community, where node 34 is the leader. This emphasizes
the idea that the assignment method should take into account
the dynamics, not only the topology. On the other
hand, the modularity function produces a greater value when
node 10 is in the second community, and that decision
is made only because the second community has a smaller
number of links (see Appendix A). All modularity-based
algorithms will fail to produce the right partition of the
network, since they are driven only by the network topology.
There is also an implicit hierarchy in this network.
There are two leaders, and communities are built around
those leaders. The removal of those leaders will result
in splitting these communities, since the leaders are keeping
these communities together. Identifying the leaders will
implicitly result in identifying the communities. Furthermore,
to avoid the well-known resolution limit, the
decision-making process should take place at the node level, and not
globally on the whole network. Today we have very large
but sparse networks. That is why we believe that the
decision about which community a node should belong to should
be based solely on the node's neighborhood. Networks
are growing rapidly, but a node's
horizon is not growing beyond its neighborhood.

FIG. 1. (Color online) The Zachary karate network [13].
Leaders with id 1 and 34 form communities and spread their
influence through the network. The partition found by our
algorithm not only matches the original partition, but also
identifies the exact leaders.

When talking about influence, it is natural to talk
about hierarchy as well. In such a hierarchy there are
nodes that are more important and influential than
other nodes, and are hence located on a higher level in the
hierarchy. It naturally follows that the leader is located
on the highest level within that hierarchy (see Fig. 2).
Since the hierarchies are a consequence of the spreading of
influence, and so are the communities, we believe that
the identification of these hierarchies in a network will
result in a natural community detection. The area in
which a leader has the most influence should define its
community. So, community detection is performed by finding
all natural leaders and all the nodes they influence.
Partitions obtained this way can be naturally explained.
This also satisfies another intuitive property that a community
should possess, namely that
shortest paths between nodes from the same community
should consist only of nodes from that community. With
well-defined hierarchies, the shortest paths will be
approximately the paths that run through the hierarchical
trees.




FIG. 2. (Color online) Social hierarchy within a community. 
The more influential nodes are located on a higher level in the 
hierarchy. The leader is located on the highest level. Semi- 
circles depict different levels in the hierarchy with the darkest 
color denoting the highest level. 



III. THE ALGORITHM 

To sum up, the hierarchical point of view, the natural
community detection that follows from it, and the
need for a decentralized approach are the basis of our
algorithm. The first step is to define the amount of influence
a node has on another node. Real networks have a
significantly high clustering coefficient, meaning that nodes
tend to form triangles with other nodes. Here, we make
use of the idea that the link density is greater within a
community than between communities. That means
that more triangles are formed inside the communities than
outside the communities. In |10j . interesting character- 
istics about a node's social embeddings are discovered. 
The in-degree can be explained by the person's genes in
46 percent of the cases. A less obvious characteristic
is that 47 percent of the variation in whether a person's
friends know one another is attributable to the person's
genes. Some people like to introduce their friends
to each other and form communities around themselves, and
others simply do not. That is what separates
the leader of a group from a regular person in that group.
The leader tends to connect its neighbors with one another
in order to build a stronger community around itself,
whereas a more marginal person tends to be
connected by someone else and to be a member of something,
rather than to connect, create, or
influence. Therefore, if a node can find the "direction"
where most of its triangles are placed, then
its community is also in that "direction". The denser
the triangles are, the closer the node is to the core of
the community. We also believe that triangles between
two neighbors serve as a better proxy for influence than
just the direct connection between the nodes. So it is
natural that the more triangles a neighbor shares with a
node, the more influence it has on that node. This shows
the connection between the influence dynamics and the
topology of the network measured with triangles.

In the following, we focus on a simple directed weighted
network $G$ with no multiple links and no self-loops,
described by its $N \times N$ adjacency matrix $A$, where $N$ is
the number of nodes. By definition, $A_{ij}$ is the topological
weight of the link going from $j$ to $i$, and $A_{ij} \neq A_{ji}$
in general. Also, since we are dealing with directed networks,
we interpret $A_{ij}$ as proportional to the influence (trust)
node $i$ (node $j$) has on node $j$ (node $i$). If the network
is undirected, $A_{ij} = A_{ji}$. The quantity $s_i = \sum_j A_{ij}$ is the
strength of node $i$; when the network is unweighted, it is simply
the in-degree of node $i$.



A. Influence matrix 

To incorporate the information we have from triangles,
we introduce network $G'$, a weighted network where
triangles are embedded into the link weights, thus
obtaining the influence matrix $A'$. To do this, let
$C_{ji}^{k} = \min\{A_{ki}, A_{jk}\}$ be the "transitive" link weight from node
$i$ to node $j$ through node $k$. We define this only for
neighboring nodes, thus $C_{ji}^{k} = 0$ if $A_{ji} = 0$. We choose the
minimum of the two link weights, motivated by the
expression "a chain is only as strong as its weakest link".
Together with the "direct" link weight $A_{ji}$, we obtain
the new link weight $A'_{ji} = A_{ji} + \sum_k C_{ji}^{k} = A_{ji} + A^{\Delta}_{ji}$.
This procedure is illustrated in Fig. 3. If the network
is undirected and unweighted, $A'_{ji}$ is simply the number
of triangles between neighbors $i$ and $j$ plus 1. Also,
for later use, we define $A^{\Delta}_{j} = \sum_k A^{\Delta}_{jk}$, which
for an undirected and unweighted network is simply twice
the number of triangles containing node $j$. However,
if the link weights already represent the actual influences we are
trying to extract (including the transitive weights), this step
should be omitted, i.e., one can take $A'_{ij} = A_{ij}$.
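
The embedding of triangles into the link weights can be sketched in a few lines of Python. This is a minimal illustration, assuming the network is stored as a dense list-of-lists matrix in which `A[j][i]` is the weight of the link going from node `i` to node `j`. The function name `influence_matrix` and the toy graph are hypothetical.

```python
def influence_matrix(A):
    """Return A' with A'[j][i] = A[j][i] + sum_k min(A[k][i], A[j][k]).

    A[j][i] is the weight of the link going from node i to node j.
    Transitive weights are added only where a direct link exists.
    """
    n = len(A)
    A_prime = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if A[j][i] == 0:
                continue  # no direct link, so every transitive weight is zero
            # path i -> k -> j is as strong as its weakest link
            transitive = sum(
                min(A[k][i], A[j][k]) for k in range(n) if k != i and k != j
            )
            A_prime[j][i] = A[j][i] + transitive
    return A_prime


# Illustrative undirected, unweighted graph: triangle 0-1-2 plus a pendant
# node 3 attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
A_prime = influence_matrix(A)
```

For this undirected, unweighted example each triangle link gets weight 2 (one shared triangle plus 1), while the pendant link 2-3 keeps weight 1.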

FIG. 3. (Color online) Embedding triangle information into
the link weights. Solid lines depict "direct" links and dashed
lines depict "transitive" links. The "transitive" link weights
are obtained as the minimum weight of the links from which they
are derived. Line labels denote link weights, with line width
proportional to the weight.

The next step is calculating the overall influence $x_i^*$
of every node $i$ in the network. The overall influence $x_i^*$
represents how important the opinion of node $i$ is in the
network, i.e., how much its opinion spreads through the
network. As a process for modeling the influence spreading
in the network, we consider the unbiased random walk
where, at each step, a walker at node $j$ follows one of the
outgoing links with probability proportional to the link's weight $A'_{ij}$.
Writing $\mathbf{x}(t) = [x_1(t)\ x_2(t)\ \ldots\ x_N(t)]$, where $x_i(t)$ is the
overall influence of node $i$ at time $t$, the expected
density of walkers evolves according to the rate equation

$$\mathbf{x}(t+1) = T\,\mathbf{x}(t), \tag{1}$$

where $T$ is the transition matrix whose entry $T_{ij}$ represents
the probability to jump from $j$ to $i$,

$$T_{ij} = \frac{A'_{ij}}{\sum_k A'_{kj}}. \tag{2}$$

$T_{ij}$ denotes the relative influence node $i$ has on node $j$.
We start with the initial vector $\mathbf{x}(0) = [\frac{1}{N}\ \frac{1}{N}\ \ldots\ \frac{1}{N}]$. The
overall influences $\{x_i^*\}$ are a steady-state solution of (1)
and, for directed networks, can be obtained only numerically
by iterating (1), that is, by letting time $t$ go to infinity.
In the special case when the network is undirected and
non-bipartite, there is a known analytical solution for $x_i^*$, i.e.,

$$x_i^* = \frac{s_i + A^{\Delta}_i}{\sum_j \left(s_j + A^{\Delta}_j\right)},$$

where $s_i + A^{\Delta}_i = \sum_j A'_{ij}$ is the strength of node $i$ in the influence matrix.
Note that a node's potential of becoming a leader depends
on the in-degree and the number of triangles, as discussed
earlier in this section.
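
The iteration of the rate equation can be sketched as a plain power iteration. The code below is a minimal illustration under stated assumptions: the influence matrix is a dense list of lists `W` with `W[i][j]` the weight of the link from `j` to `i`, the transition matrix is obtained by column normalization, and the walk starts from a uniform vector. The names `transition_matrix` and `overall_influence`, the tolerance, and the example matrix (a triangle with one pendant node, triangle links weighted 2) are hypothetical.

```python
def transition_matrix(W):
    """Column-normalize W so that T[i][j] = W[i][j] / sum_k W[k][j]."""
    n = len(W)
    col_sums = [sum(W[k][j] for k in range(n)) for j in range(n)]
    return [
        [W[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def overall_influence(T, tol=1e-12, max_iter=10000):
    """Iterate x(t+1) = T x(t) from a uniform start until x stops changing."""
    n = len(T)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        x_next = [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(x, x_next)) < tol:
            return x_next
        x = x_next
    return x


# Example influence matrix: triangle 0-1-2 (weights 2) and pendant link 2-3.
W = [
    [0, 2, 2, 0],
    [2, 0, 2, 0],
    [2, 2, 0, 1],
    [0, 0, 1, 0],
]
T = transition_matrix(W)
x_star = overall_influence(T)
```

Because this example is undirected and contains a triangle, the walk converges, and the result agrees with the standard stationary distribution of a random walk on an undirected network: each node's share is its total link weight divided by the sum over all nodes, here 4/14, 4/14, 5/14 and 1/14.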



B. Leaders identification 

Since we now know the relative influences between the
nodes, $T_{ij}$, and the overall influences of the nodes, $x_i^*$, we can
find the leaders in the network. A leader should have a
large overall influence, since the overall influence
represents how close a node is to the core of its community
and its actual potential of becoming a leader. Also, a
leader should have more influence on its neighbors than
they have on it. Therefore, we define leaders as those
nodes for which the product (overall influence) $\times$
(relative influence) is large. More precisely, we denote with
$\Gamma_i = \{j \mid T_{ji} = \max_k T_{ki}\}$ the set of neighbors with the
largest relative influence on node $i$. Node $i$ is a leader if

$$T_{ij} \cdot x_i^* \geq T_{ji} \cdot x_j^* \tag{3}$$

for all $j \in \Gamma_i$. The product $T_{ij} \cdot x_i^*$ of the two numbers $T_{ij}$
and $x_i^*$ combines the relative influence of node $i$ on
node $j$ with the overall influence of node $i$.

Note that in the rare cases where two or more leaders
are also the most influential neighbors of each other
(that is, when $T_{ij} \cdot x_i^* = T_{ji} \cdot x_j^*$), they are grouped
together and become leaders of one group. For
example, in a full mesh network, all of the nodes are leaders
of one community, whereas in a ring network, each
node is the leader of its own community. This
suggests that in cases where there is a lack of hierarchical
structure, i.e., no particular leader in a community, the
community will be split into subgroups and the partition
will depend on its link density.
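
The leader test can be sketched as follows. This is a minimal illustration, assuming a dense column-stochastic matrix `T` with `T[i][j]` the relative influence of node `i` on node `j`, and a list `x` of overall influences. The comparison is taken as non-strict with a small tolerance, an assumption made here so that mutually tied nodes (as in the full mesh example) can qualify. Merging tied leaders into one group is not implemented, and nodes without neighbors are skipped. The name `find_leaders` and the example values are hypothetical.

```python
def find_leaders(T, x, eps=1e-12):
    """Return the nodes i with T[i][j] * x[i] >= T[j][i] * x[j] for every
    neighbor j that has the largest relative influence on i."""
    n = len(T)
    leaders = []
    for i in range(n):
        # T[j][i] is the relative influence of node j on node i
        strongest = max(T[j][i] for j in range(n))
        if strongest == 0:
            continue  # assumption: an isolated node is not treated as a leader
        gamma_i = [j for j in range(n) if abs(T[j][i] - strongest) <= eps]
        if all(T[i][j] * x[i] >= T[j][i] * x[j] - eps for j in gamma_i):
            leaders.append(i)
    return leaders


# Example: triangle 0-1-2 with a pendant node 3 attached to node 2, using
# triangle-weighted links (weight 2 inside the triangle, 1 on the link 2-3).
T = [
    [0.0, 0.5, 0.4, 0.0],
    [0.5, 0.0, 0.4, 0.0],
    [0.5, 0.5, 0.0, 1.0],
    [0.0, 0.0, 0.2, 0.0],
]
x = [4.0 / 14, 4.0 / 14, 5.0 / 14, 1.0 / 14]
leaders = find_leaders(T, x)
```

In this example only node 2 passes the test: nodes 0 and 1 tie with each other but are both dominated by node 2, and node 3 is dominated by its single neighbor.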

C. Computing the membership vectors 

Suppose we have $L$ leaders in the network, hence $L$
communities, and let $\mathcal{L} = \{l_1, l_2, \ldots, l_L\}$ be the set of
all the leaders. We calculate the membership vector
$\mathbf{y}_i = [y_i^1\ y_i^2\ \cdots\ y_i^L]^T$, a probability vector of length $L$
that describes node $i$'s involvement in each community.
Since $\mathbf{y}_i$ is a probability vector, its components sum to
1, i.e., $\sum_{k=1}^{L} y_i^k = 1$. For every leader $l_i$, the initial
membership vector $\mathbf{y}_{l_i}(0)$ has all the components equal
to zero, except for the $i$-th component, $y_{l_i}^i = 1$. For every
node $j$ that is not a leader, all the components of $\mathbf{y}_j(0)$
are initialized to $\frac{1}{L}$ to denote equal participation in each
community. For computing the membership vectors, we
consider consensus dynamics, i.e.,

$$\mathbf{y}_i(t+1) = \frac{1}{\sum_j A_{ji}} \sum_j A_{ji}\,\mathbf{y}_j(t) = \sum_j S_{ji}\,\mathbf{y}_j(t).$$

At each time step, the membership vector of each node
is updated by computing a weighted average of the
membership vectors of its neighbors. We do not use matrix
$A'$ since the influence embedded in $A'$ will naturally occur
in this process and its inclusion can introduce bias.
However, if $S$ is irreducible, which is often true for
undirected graphs, this system will converge to a consensus
state, where all the nodes reach an agreement, thus
having the same membership vector. To avoid this, we keep
the leaders' membership vectors immutable, i.e.,

$$\mathbf{y}_{l_i}(t+1) = \mathbf{y}_{l_i}(t) = \cdots = \mathbf{y}_{l_i}(0).$$

This way, we modify matrix $A$ by connecting each leader
only to itself. After this modification, matrix $S$ remains
a Markov matrix with every column summing to 1, which
guarantees convergence since the largest eigenvalue of $S$
is 1 and its multiplicity is $L$.
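
The consensus dynamics with immutable leaders can be sketched as a synchronous averaging loop. This is a minimal illustration, assuming a dense adjacency matrix with `A[j][i]` the weight of the link from node `i` to node `j`, so that node `i` averages over the nodes it links to. The function name `membership_vectors`, the stopping tolerance, and the path-graph example are hypothetical.

```python
def membership_vectors(A, leaders, tol=1e-10, max_iter=10000):
    """Average neighbors' membership vectors while keeping leaders fixed."""
    n, L = len(A), len(leaders)
    leader_index = {node: c for c, node in enumerate(leaders)}
    # non-leaders start with equal participation in every community
    y = [[1.0 / L] * L for _ in range(n)]
    for node, c in leader_index.items():
        y[node] = [1.0 if k == c else 0.0 for k in range(L)]
    for _ in range(max_iter):
        y_next = []
        for i in range(n):
            total = sum(A[j][i] for j in range(n))
            if i in leader_index or total == 0:
                y_next.append(y[i])  # leaders (and isolated nodes) do not move
                continue
            y_next.append(
                [sum(A[j][i] * y[j][k] for j in range(n)) / total for k in range(L)]
            )
        change = max(abs(a - b) for u, v in zip(y, y_next) for a, b in zip(u, v))
        y = y_next
        if change < tol:
            break
    return y


# Example: undirected path 0-1-2-3-4 with the end nodes chosen as leaders.
A = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
y = membership_vectors(A, leaders=[0, 4])
```

On this path the memberships interpolate linearly between the two leaders: node 1 ends at (0.75, 0.25), node 2 at (0.5, 0.5) and node 3 at (0.25, 0.75), so the middle node has no dominant community.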

IV. APPLICATIONS TO REAL NETWORKS 

A. Properties of the algorithm 

In this subsection we discuss four properties of the al- 
gorithm: detecting overlapping nodes, detecting leaders, 
detecting hierarchical organization, and decentralization. 

1. Detecting overlapping nodes

An important property of our algorithm is the compu- 
tation of a membership vector for each node. Instead of 
having one number denoting its membership in a single 
community, we have a percentage for each community. 
As a result, we can easily identify nodes that naturally 
belong to more than one community, known as overlap- 
ping nodes. Additionally, we can find nodes that are good 
followers of their leader, but also nodes that have no dis- 
tinguished leader and serve as a proxy between several 
communities. 
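
One hypothetical way to rank nodes between "good follower" and "proxy" is the normalized entropy of the membership vector, together with a threshold on its components. Neither the entropy measure nor the threshold value is part of the algorithm described here. They are assumptions made only to illustrate how the membership vector can be read.

```python
import math


def membership_entropy(y):
    """Normalized Shannon entropy of a membership vector, between 0 and 1.

    0 means the node follows a single leader; 1 means equal participation
    in every community.
    """
    if len(y) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in y if p > 0)
    return h / math.log(len(y))


def significant_communities(y, threshold=0.25):
    """Indices of communities whose share reaches the (assumed) threshold."""
    return [k for k, p in enumerate(y) if p >= threshold]


follower = [0.95, 0.05, 0.0]
overlapping = [0.5, 0.45, 0.05]
proxy = [1.0 / 3, 1.0 / 3, 1.0 / 3]
scores = [membership_entropy(v) for v in (follower, overlapping, proxy)]
```

With these example vectors the entropy increases from the follower to the proxy, and `significant_communities(overlapping)` returns the two communities the node is split between.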



2. Detecting leaders 

As our algorithm is best suited for real networks where
the process of influence spreading takes place, it is only
natural that it can be used for influence-related
problems. One such problem is the actual identification of the leader
in a community. By detecting the leader of a community
we gain very useful information, as the leader, by the
definition of the algorithm, is the most influential node
in its community. If the leader is removed, the community
can be expected to suffer serious consequences,
such as splitting into several smaller communities or
degrading completely. The leader's hierarchy, or the leader's
community, is the area where the leader's opinion is the
most influential opinion. This can be used for an efficient
viral marketing campaign, for example. One interesting
feature of the algorithm is that although it automatically
detects the best leaders, one can specify some
nodes as leaders a priori and build community structures around
them.



3. Detecting hierarchical organization 

Another characteristic feature of the algorithm is the
possibility of deriving the hierarchical organization of
the communities. A node's parent can be easily detected
from the influence matrix and the overall influences. It
can be the most influential neighbor in its community,
or it can be the neighbor most strictly oriented towards
the same community, which is in fact on a higher hierarchical
level. This can be used in communication networks,
where a node can use a hyperbolic greedy algorithm to
forward packets to other nodes in the community [30],
which is important since communication is more
frequent within a community. As for the other nodes, the
greedy algorithm can be used to forward the packets to
the leader, supposing that the leader knows how to
forward those packets to the respective leader. Also, a node's
siblings can be detected, as they all share the same
parent. This may be used in a missing-link prediction
scenario, for example.



4. Decentralization



The idea behind decentralization is that a node should
be able to decide to which community it belongs only
by considering its neighborhood, without taking into
account any global characteristics of the system. An important
property of our algorithm is that it can be applied
in decentralized scenarios. In the first step of the
algorithm, we only need the connectivity in the neighborhood
of each node to determine the matrix $A'$. In the second
step, the influences are computed with random walk
iterations, which can run in a distributed fashion using message
passing. Leader identification involves a direct message
exchange between every node $i$ and its potential
neighboring leaders ($\Gamma_i$). In the last step, the leaders spread
their influence in the network through their neighbors
with message passing and, within several iterations, the
system stabilizes to the desired state. All of the described
steps can be carried out in a decentralized fashion with
message passing. Our algorithm can also incorporate
network dynamics. If a new node is added to the
network, it finds its parent and calculates its membership
vector. If a node is removed from the network, or a link
is added to or removed from the network, the affected nodes
can detect their parents and recalculate their membership
vectors.

Even though it was designed with social networks in mind,
we believe our algorithm can be used in various contexts.
Very often, in wireless sensor networks with low energy
requirements and limited sensor memory, we need to
aggregate the sensor data of the nodes. Since the nodes are
deployed in a Euclidean space, one should expect a
non-negligible number of triangles. Also, we can expect
the detected communities to depend on the geographic
distribution of the nodes (smaller communities of
approximately equal size are expected if the geographical
distribution of the nodes is uniform). Furthermore, the
sensor data aggregation is geographically based. As a
result, aggregation on a community level will be a
good aggregation. The hierarchical organization within
a community can be very useful for the aggregation
process. Clearly, the leader is best suited to be an aggregator,
so the nodes should transfer their sensor data to the
leader. Moreover, one can assign arbitrary nodes as
aggregators, such as nodes that have more resources. Since
the sensor nodes have very limited resources (processing,
memory, energy, etc.), a simple memory-free hyperbolic
greedy algorithm, based on the derived hierarchy, can be
of great significance [30].

If executed in a centralized fashion, our algorithm has
low computational complexity, varying from $O(N)$ to
$O(N^2)$, depending on the power-law exponent of the
degree distribution and the number of detected communities
(leaders) (see Appendix B).

B. Real-world networks

In order to verify the validity of our algorithm, we run
it on several real social networks, which we
believe represent a good fraction of the wide body of
social networks. The networks are small, easy to visualize,
and have been explored by many researchers studying
social behavior. Thus, we can assess the performance of our
algorithm visually and qualitatively. Our algorithm
is also fast enough to work with large networks having
millions of links, but we did not find a social network
rich enough in metadata to objectively measure our
algorithm. Furthermore, we avoid the LFR benchmark
since its connection to real social networks is questionable,
and even if we ignored that, we would still be unable to validate
whether our algorithm found the real leaders. When visualizing
the results, each leader is assigned a different color, and
each node is assigned a color that is a weighted average
of the colors of the leaders in the network, based on its
membership vector. The layout is done
by the Fruchterman-Reingold algorithm [32], with node
sizes proportional to their overall influences. When we
compare partitions, we marginalize our result by assigning
each node to the community with the highest component
in the corresponding membership vector. This can
also be seen visually, as each node has a dominant color.
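
The marginalization and the color blending can be sketched as follows. This is a minimal illustration, assuming membership vectors are plain lists of shares and leader colors are RGB triples. The function names `dominant_community` and `blend_color` and the example colors are hypothetical.

```python
def dominant_community(y):
    """Index of the community with the largest membership component."""
    return max(range(len(y)), key=lambda k: y[k])


def blend_color(y, leader_colors):
    """Weighted average of the leaders' RGB colors, weighted by membership."""
    return tuple(
        sum(w * color[channel] for w, color in zip(y, leader_colors))
        for channel in range(3)
    )


leader_colors = [(255, 0, 0), (0, 0, 255)]  # red and blue leaders
node_membership = [0.5, 0.5]
node_color = blend_color(node_membership, leader_colors)
label = dominant_community([0.2, 0.7, 0.1])
```

A node split evenly between a red and a blue leader is drawn in purple, while `dominant_community` reduces a membership vector to a single label for comparison with non-overlapping partitions. When several components tie, `max` returns the first one, which is an arbitrary choice made here.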

1. The Zachary karate network 

One of the most popular networks for validating
community structures is the Zachary karate network [13]. It
is a friendship network consisting of social interactions
among members of a karate club, so it is driven by the
spreading of influence. The author denoted two nodes as
leaders in the network, the president of the club and the
instructor (nodes 34 and 1, respectively), and two respective
communities. The communities were created
after a rift between the two leaders.
As shown in Fig. 1, the
partition found by our algorithm not only matches the
original partition, but also identifies the exact leaders.

2. Bottle-nose dolphins network 

Another popular real-world network in the community
detection field is the network of bottle-nose dolphins of Doubtful
Sound [14], [15]. The network consists of 62 dolphins
observed over a seven-year period, with links corresponding
to significantly frequent association. The network
was split into two communities [16] for the period when a
dolphin (node SN100) located between the groups
temporarily disappeared. Further, a clear, statistically
significant assortative mixing by sex was also detected
among the dolphin population. So, for this network we
only have a predefined strong division, with further
divisions probably dependent on gender. Our algorithm
detects four communities (see Fig. 4), where, if Topless's,
Grin's and TR77's communities are combined, we obtain
the original strong division into two groups. The
produced partition is very similar to the ones produced by
PU and [2U]. Most of the Topless-, TR77- and Gallatin-oriented
dolphins are males (black labels), and almost
all of the Grin-oriented dolphins are females (white
labels). There are four nodes with unknown gender (gray
labels). Access to oestrus females (females in rutting
season) tends to be the main driver of male sociality [17].
The evolution of the complex relationships between male
groups was driven by sexual competition, probably to
outcompete other males for female choice. Topless's and
TR77's communities seem to be driven by those rules as
well, as the author noted that most of the males from
those groups spent significantly more time with oestrus
females than with the other female groups. Indeed, it can
be seen that the male dolphins from Topless's community
have significant association with the female dolphin
Trigger and the females from Grin's community. The
core dolphins from Gallatin's community did not spend
significantly more time with the oestrus females. In a
way, this confirms our partition as a good one. Some of
the detected leaders are identified as central individuals
by the author [15].




FIG. 4. (Color online) The bottle-nose dolphins network.
Four leaders are detected: Topless, Grin, TR77 and Gallatin.
Gallatin's community and the combination of the other three
communities give the main strong division noted by the
authors. Almost all of the Grin-oriented dolphins are female,
and most of the other dolphins are male. The female
dolphins are labeled in white, the males in black, and those
with unknown gender in gray. There is a
clear, statistically significant assortative mixing by sex among
the dolphin population [16], and access to oestrus females
tends to be the main driver of male sociality [17], which in a
way explains our partition.



3. Sawmill network 

Fig. 5 shows the sawmill communication network,
which is a communication network between the
employees within a sawmill [18]. The network consists of
employees speaking English and Spanish. Also,
there are four sectors: the planer crew, the mill crew,
the mill management and the yard. There are two
non-sector members, the kiln operator and the forester. The
large sectors, the planer crew and the mill crew, are
further divided into two subgroups corresponding to the
native language. Our algorithm detects four communities
with four leaders: nodes 12, 36, 31 and 27. Two of
the communities correspond to the English planer and
mill crews, node 36's and node 27's, respectively. The
Hispanic planer and mill crews (native Spanish speakers) are joined
together in node 12's community. This comes as a
result of the lack of hierarchical and community structure
in the planer crew (upper left), meaning that none of the nodes
acts as a leader. As a consequence, the nodes of this
group are mostly oriented towards the employee Juan
(node 12), which is due to the large overall influence that
Juan has. A significant information flow is
conducted through that employee, as noted by the author
as well. Also, the nodes from the Hispanic planer crew
are strongly influenced by node 36, the leader of
the English planer crew, so their colors are a mixture of
node 12's and node 36's colors. The final community,
that of node 31 (the mill manager), is the mill management
merged with the small yard sector (only two employees),
the kiln operator and the forester.




FIG. 5. (Color online) The sawmill communication network.
Our algorithm successfully identifies the sectors within the
sawmill and the divisions corresponding to the native
languages, with the only difference being the merging of the two
Spanish sectors. That comes as a result of the lack of
hierarchical and community structure in one of the sectors and
the large overall influence of the employee Juan (node 12), also
noted by the author [18].



4. Sawmill strike network

Fig. 6 shows the communication network between the
employees within a sawmill during a strike [19]. The strike
occurred as a result of new rules, introduced by the new
management, that changed the workers' compensation
package. Company management (not shown in the figure)
perceived that the two union negotiators were not fully
communicating their terms to all of the union members.
They felt that the new wage package was not being
properly explained to all employees by the union
negotiators. The research reveals the network structure.
There are two groups divided by age (see Fig. 6): a group
of older employees (over 38 years old, right side) and a
group of younger employees (under 30 years old, left side).
Further, the group of younger employees is divided by
native language: English (bottom) and Spanish (top).
The author identified the nodes with id 9 and id 14 as the
most central nodes in the young and old groups,
respectively. Our algorithm identifies the same nodes as
leaders. The node with id 10 is also identified as a leader;
the author notes it as the most proficient English speaker
in the densely connected Hispanic group, and the only one
that communicates outside that group. The research
helped resolve the negotiation stalemate between the new
management and the negotiators (nodes with id 22 and
24). Since the main problem was perceived to be the lack
of communication between the negotiators and the rest of
the employees, particularly the young ones, cooperation
with nodes 9 and 14 was proposed in order to improve
communication. That was indeed the case: the strike,
which had lasted more than three weeks, ended within 48
hours, and production restarted shortly thereafter.



V. CONCLUSION 



Since the main drawbacks of the modularity function
were discovered [24-28], researchers' focus on it has
decreased slightly, and we expect a new wave of different
approaches and algorithms in the coming years. Our
algorithm follows this trend by not embracing the
universal approach, but rather focusing on social networks
and the dynamic social interactions that occur on those
networks. The membership vectors found by our algorithm
are much more descriptive than a partition; we obtain
partitions by marginalizing the membership vectors.
Besides community detection, identifying leaders can be
very important when modeling dynamics between groups
of opposing members in a network, such as in elections
and marketing campaigns.




FIG. 6. (Color online) The Sawmill strike network.
Communication network of the employees within a sawmill
during a strike [19]. The network has three communities,
all correctly identified by our algorithm: the young English
group (bottom-left), the young Spanish group (top-left) and
the old English group (right). The leaders detected by the
algorithm are also noted by the author as the most central
nodes in their groups.



Appendix A: Misjudgment of the modularity 
function 



Suppose we have a graph $G$ with $n$ nodes, with two
communities $C_1$ and $C_2$, and with a total of $m$ links
between the nodes. We observe a single node $x$ in the
network. It has $d_1$ links to nodes from the $C_1$-community
and $d_2$ links to nodes from the $C_2$-community. We want
to know how the modularity function decides whether to
place the node $x$ in the $C_1$-community or in the
$C_2$-community, that is, how the network topology (the
communities' sizes and their numbers of links and nodes)
influences the value of the modularity function for a given
partition. Let $Q^1$ be the modularity value if the node $x$
is placed in the $C_1$-community and $Q^2$ be the value if it
is placed in the $C_2$-community. Let $Q_x^1$ be the
contribution that the node $x$ gives to the partition by
joining the $C_1$-community, and $Q_x^2$ the contribution of
joining the $C_2$-community. The modularity function is
given by



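\[
Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\]

where $c_i$ is the community to which node $i$ belongs,
$\delta(c_i, c_j)$ is the Kronecker delta symbol, and $k_i$ is the
degree of node $i$. We take
$K_i = \sum_{j \in C_i,\, j \neq x} k_j$. Then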



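\[
\begin{aligned}
Q_x^1 &= \frac{1}{2m} \sum_{j \neq x} \left[ A_{xj} - \frac{k_x k_j}{2m} \right] \delta(c_x, c_j) \\
      &= \frac{1}{2m} \sum_{j \in C_1,\, j \neq x} A_{xj}
         - \frac{1}{2m} \cdot \frac{k_x}{2m} \sum_{j \in C_1,\, j \neq x} k_j \\
      &= \frac{1}{2m} \left[ d_1 - \frac{d_1 + d_2}{2m} K_1 \right].
\end{aligned}
\]

In a similar way,

\[
Q_x^2 = \frac{1}{2m} \left[ d_2 - \frac{d_1 + d_2}{2m} K_2 \right].
\]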



Since

\[
Q^1 - 2 Q_x^1 = Q^2 - 2 Q_x^2,
\]

we have

\[
Q^1 < Q^2 \iff Q_x^1 < Q_x^2 .
\]



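We want to know when the modularity function will
choose $C_2$ over $C_1$, so we explore when $Q_x^1 < Q_x^2$
holds:

\[
Q_x^1 - Q_x^2 = \frac{1}{2m} \left[ d_1 - d_2 - \frac{d_1 + d_2}{2m} (K_1 - K_2) \right].
\]

If we assume $d_1 = t d_2$, where $t \geq 1$ is an integer, we have

\[
Q_x^1 - Q_x^2 = \frac{1}{2m} \left[ (t - 1) d_2 - \frac{(t + 1) d_2}{2m} (K_1 - K_2) \right].
\]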



In order for this expression to be smaller than 0, we must
have

\[
(t - 1) d_2 < \frac{(t + 1) d_2}{2m} (K_1 - K_2),
\]

that is

\[
(t - 1)\, 2m < (t + 1) K_1 - (t + 1) K_2 .
\]

Since $K_1 + K_2 + d_1 + d_2 = 2m$, we obtain

\[
(t - 1)(K_1 + K_2 + (t + 1) d_2) < (t + 1) K_1 - (t + 1) K_2 ,
\]

that is

\[
2 t K_2 + (t^2 - 1) d_2 < 2 K_1 .
\]

For example, let us consider the simplest case: the node
$x$ has one link to a node from the $C_1$-community and one
link to a node from the $C_2$-community. So, $t = 1$ and
$d_2 = 1$. We have that $K_2 < K_1$ is the only condition
for the modularity function to choose the $C_2$-community.
This is exactly the case for node 10 in the Zachary karate
club network. The modularity function produces a greater
value when node 10 is placed in the community of node 1,
only because that community is smaller (in terms of links);
it gives no significance to the fact that one of the
neighbors of node 10 is the leader of the other community,
node 34. As another example, let the node $x$ have $t$ links
to nodes from the $C_1$-community and 1 link to a node
from the $C_2$-community. So, $d_2 = 1$, and
$2 t K_2 + t^2 - 1 < 2 K_1$ is the condition for the
modularity to choose the $C_2$-community. That means that
if the $C_1$-community is approximately $t$ times larger than
the $C_2$-community, the modularity function will produce a
bigger value when the node $x$ is placed in the
$C_2$-community, despite the fact that it has only 1 link
with nodes from that community, compared to the $t$ links
with nodes from the $C_1$-community. One can say that the
modularity tends to make the communities equal.
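The simplest case above ($t = 1$, $d_2 = 1$) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the toy graph (a 5-clique as $C_1$, a triangle as $C_2$, and a node $x$ with one link to each) and the function names `modularity` and `delta_x` are illustrative assumptions. It evaluates the modularity for both placements of $x$ and compares the difference with the closed-form expression for $Q_x^1 - Q_x^2$.

```python
from itertools import combinations


def modularity(edges, labels):
    """Newman-Girvan modularity Q of an undirected simple graph.

    edges: list of (i, j) pairs; labels: dict node -> community id.
    """
    two_m = 2 * len(edges)
    degree = {v: 0 for v in labels}
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    # Sum of A_ij over ordered pairs in the same community (each link counts twice).
    inside = 2 * sum(1 for i, j in edges if labels[i] == labels[j])
    # Sum of k_i * k_j over ordered pairs in the same community = sum_c (K_c)^2.
    totals = {}
    for v, c in labels.items():
        totals[c] = totals.get(c, 0) + degree[v]
    expected = sum(K * K for K in totals.values()) / two_m
    return (inside - expected) / two_m


def delta_x(d1, d2, K1, K2):
    """Closed form of Q_x^1 - Q_x^2, using 2m = K1 + K2 + d1 + d2."""
    two_m = K1 + K2 + d1 + d2
    return (d1 - d2 - (d1 + d2) * (K1 - K2) / two_m) / two_m


# Hypothetical toy graph (an assumption for illustration only).
edges = list(combinations(range(5), 2))   # C1: 5-clique on nodes 0..4
edges += [(5, 6), (5, 7), (6, 7)]         # C2: triangle on nodes 5..7
edges += [(8, 0), (8, 5)]                 # node x = 8 with d1 = d2 = 1

base = {v: 1 for v in range(5)}
base.update({v: 2 for v in range(5, 8)})
q1 = modularity(edges, {**base, 8: 1})    # x placed in C1
q2 = modularity(edges, {**base, 8: 2})    # x placed in C2

# K1 = 4*4 + 5 = 21 and K2 = 2 + 2 + 3 = 7 (degrees excluding x itself).
gap = delta_x(d1=1, d2=1, K1=21, K2=7)
print(q1 - q2, 2 * gap)
```

Both printed values agree and are negative, so modularity prefers the smaller community $C_2$ here, in line with the condition $K_2 < K_1$.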



Appendix B: Computational complexity of the 
algorithm 

The algorithm consists of 4 steps. We now analyze 
the running times of each step in order to determine the 
overall algorithm's complexity. 



Influence matrix 

Building the weighted adjacency matrix A is done
by computing the number of mutual triangles or
common neighbors between every pair of neighboring
nodes in the network. Without loss of generality,
we give pseudocode for the undirected network
case:

for each node i do
    for each neighbor j of i do
        intersectNeighbors(i, j)    {calculate A_ij}
    end for
end for

where intersectNeighbors(i, j) finds the intersection
between the sets of neighbors of nodes i and j in time linear
in the size of the sets, since we keep the sets sorted.
Consequently, the running time is $\sum_{i=1}^{N} k_i^2$, where
$k_i$ is the degree of node $i$. Note that this is not the same as
$N \cdot \langle k \rangle^2$, where $\langle k \rangle$ is the average node
degree, since in real networks the degree distribution is
usually a power law, $P(k) \sim k^{-\alpha}$, with the scaling
factor $2 < \alpha < 3$. We have



\[
\sum_{i=1}^{N} k_i^2 = N \int k^2 P(k)\, dk \sim N \int k^{2-\alpha}\, dk .
\]

Since this integral diverges, we have to approximate its
lower and upper bounds. We consider the network to be
connected, meaning there exists a path between every
pair of nodes in the network. That means the lowest
degree in the network is 1, and that is the lower bound of
the integral, $k_{min} = 1$. In [31] an approximation
$k_{max} \approx N^{\frac{1}{\alpha - 1}}$ is derived. We take
$P(k) = C k^{-\alpha}$, where $C$ is a constant. Therefore, we have

\[
\begin{aligned}
\sum_{i=1}^{N} k_i^2 &= N \int_{k_{min}}^{k_{max}} k^2 P(k)\, dk
                      = N C \int_{1}^{k_{max}} k^{2-\alpha}\, dk \\
                     &= \frac{N C}{3 - \alpha} \left( k_{max}^{3-\alpha} - 1 \right)
                      \sim N \cdot N^{\frac{3-\alpha}{\alpha-1}}
                      = \begin{cases} N^2 & \text{if } \alpha = 2 \\ N & \text{if } \alpha = 3 \end{cases}
\end{aligned}
\]

Thus, the running time of computing the influence matrix
varies from $O(N)$ to $O(N^2)$ depending on the scaling
factor $\alpha$. This can be confirmed by Fig. 7, where we
generated graphs with power-law degree distribution for
$\alpha = 2.01$ and $\alpha = 2.99$. Each point is an average of
100 runs. As expected, the running time for $\alpha = 2.01$ is
quadratic and for $\alpha = 2.99$ is linear.



FIG. 7. (Color online) Execution time simulations for
calculating the influence matrix. The inset shows that the running
time grows linearly with the number of nodes when $\alpha = 2.99$.
On the other hand, the running time is $O(N^2)$ when $\alpha = 2.01$.
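The pseudocode for the influence matrix step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the names `intersect_sorted` and `common_neighbor_counts` are hypothetical, and the sketch stores only the raw common-neighbor count for each link, whereas the actual weighting used for A is defined in the main text and is not reproduced here.

```python
def intersect_sorted(a, b):
    """Count common elements of two ascending lists in O(len(a) + len(b))."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def common_neighbor_counts(neighbors):
    """Common-neighbor count for every link of an undirected graph.

    neighbors: dict node -> ascending list of adjacent nodes.
    Returns a dict (i, j) -> count with i < j, one entry per link.
    """
    counts = {}
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if i < j:  # undirected case: visit each link once
                counts[(i, j)] = intersect_sorted(nbrs, neighbors[j])
    return counts


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
counts = common_neighbor_counts(graph)
print(counts)
```

Each call to `intersect_sorted` costs time linear in the two degrees involved, which is where the $\sum_i k_i^2$ running time comes from.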



Nodes' overall influences 

This process is a random walk. If the network is
undirected, we even know the exact influences. So, the
complexity is $O(c \cdot m) \approx O(m) \approx O(N)$, where $c$
is the number of iterations until convergence, which is
usually less than 50, and $m \propto N$ in sparse graphs.
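As an illustration of why the undirected case needs no iteration, the sketch below uses a simple unweighted random walk as a stand-in for the influence process (an assumption: the paper's walk runs on its own weighted influence matrix, which is not reproduced here). For such a walk on a connected undirected graph, the stationary probability of a node is its degree divided by $2m$. The function names and the toy graph are hypothetical.

```python
def stationary_distribution(neighbors, tol=1e-12, max_iter=10_000):
    """Power iteration for a simple random walk on an undirected graph.

    neighbors: dict node -> list of adjacent nodes. Assumes the graph is
    connected and not bipartite, so that the iteration converges.
    """
    nodes = list(neighbors)
    p = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:  # one sweep touches every link twice: O(m)
            share = p[v] / len(neighbors[v])
            for u in neighbors[v]:
                nxt[u] += share
        change = max(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


def degree_distribution(neighbors):
    """Closed-form stationary distribution: degree divided by 2m."""
    two_m = sum(len(nbrs) for nbrs in neighbors.values())
    return {v: len(nbrs) / two_m for v, nbrs in neighbors.items()}


# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
iterated = stationary_distribution(graph)
exact = degree_distribution(graph)
print(iterated)
print(exact)
```

The iterated values approach the closed-form ones, and each sweep costs $O(m)$, which gives the $O(c \cdot m)$ total for $c$ sweeps.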

Leaders identification

Here each node is in a battle with each of its potential 
parents, so clearly we have O(N) complexity. 



Computing the membership vectors 

This operation is very similar to the linear consensus
process, except that a vector, instead of a single number,
is associated with each node. The complexity is
$O(N \times L)$, where $L$ is the number of leaders.
In Fig. 8 we show the execution time of this step on
simulated LFR networks with power-law degree distribution
for $\alpha = 2$, 2.5 and 3 [11]. The parameters we use are
similar to the ones in [11]. Each point is an average of 100
runs. The average node degree is 20 and the maximum
degree is 50. The exponent of the power-law distribution of
community sizes is 1, the minimum community size is 20 and
the maximum community size is 100. The mixing parameter
of every network is 0.3. Since we restrict the community
size, the number of generated communities grows linearly
with the number of nodes, i.e., $L \sim N$, which makes
the running time quadratic. This is only a consequence of
applying the LFR benchmark with these parameters, and
does not reflect any characteristic of our algorithm. In
general, the number of communities does not necessarily
grow with the size of the network.
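To make the $O(N \times L)$ cost concrete, here is a minimal sketch of a generic consensus-style update on membership vectors. It is an assumption for illustration and not the update rule of the algorithm itself, which is defined in the main text: leaders keep a fixed one-hot vector and every other node repeatedly takes the plain average of its neighbors' vectors. The function name, the fixed number of sweeps and the toy path graph are hypothetical.

```python
def consensus_membership(neighbors, leaders, sweeps=200):
    """Generic consensus-style averaging of length-L membership vectors.

    neighbors: dict node -> list of adjacent nodes (undirected graph).
    leaders: list of leader nodes; leader c keeps the one-hot vector e_c.
    Every non-leader node is assumed to have at least one neighbor.
    """
    L = len(leaders)
    index = {leader: c for c, leader in enumerate(leaders)}
    vec = {}
    for v in neighbors:
        if v in index:
            vec[v] = [1.0 if c == index[v] else 0.0 for c in range(L)]
        else:
            vec[v] = [1.0 / L] * L
    for _ in range(sweeps):
        new = {}
        for v, nbrs in neighbors.items():
            if v in index:
                new[v] = vec[v]  # leaders do not change
            else:
                # Cost per node: degree * L, so one sweep is O(m * L).
                new[v] = [sum(vec[u][c] for u in nbrs) / len(nbrs)
                          for c in range(L)]
        vec = new
    return vec


# Hypothetical example: a path 0-1-2-3-4 whose end nodes are the leaders.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
membership = consensus_membership(path, leaders=[0, 4])
print(membership)
```

On this path the vectors settle so that node 1 leans towards leader 0, node 3 towards leader 4, and the middle node 2 splits evenly between the two. With $m \propto N$ in sparse graphs, one sweep costs on the order of $N \times L$ operations.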






FIG. 8. (Color online) Running times of the algorithm on
the LFR benchmark [11]. The inset shows that the number
of communities grows linearly with the number of nodes,
because we restrict the community size. As a consequence, the
complexity of the algorithm is $O(N^2)$.

To conclude this section, the running times of the first
and the last step are of the highest order, with execution
times varying from $O(N)$ to $O(N^2)$, depending on the
power-law exponent and the number of detected
communities, respectively. Thus, the overall complexity varies
from $O(N)$ to $O(N^2)$ as well.
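The dependence of the first step on the power-law exponent can be checked with a rough numerical sketch. It evaluates the continuum estimate of $\sum_i k_i^2$ from the influence matrix analysis, assuming $k_{min} = 1$, $k_{max} = N^{1/(\alpha - 1)}$ and a constant $C$ that normalizes $P(k)$ on $[1, k_{max}]$, and measures the slope on a log-log scale between two network sizes. The function names and the two sizes are illustrative assumptions.

```python
import math


def sum_k2_estimate(n, alpha):
    """Continuum estimate of sum_i k_i^2 = N * C * int_1^kmax k^(2-alpha) dk.

    Assumes 2 < alpha < 3, k_min = 1 and k_max = N^(1/(alpha-1)), with C
    chosen so that P(k) = C * k^(-alpha) integrates to 1 on [1, k_max].
    """
    k_max = n ** (1.0 / (alpha - 1.0))
    c = (alpha - 1.0) / (1.0 - k_max ** (1.0 - alpha))
    integral = (k_max ** (3.0 - alpha) - 1.0) / (3.0 - alpha)
    return n * c * integral


def loglog_slope(alpha, n_small=10**4, n_large=10**6):
    """Growth exponent of the estimate between two network sizes."""
    low = sum_k2_estimate(n_small, alpha)
    high = sum_k2_estimate(n_large, alpha)
    return math.log(high / low) / math.log(n_large / n_small)


slope_dense_tail = loglog_slope(2.01)   # expected to be close to 2
slope_light_tail = loglog_slope(2.99)   # expected to be close to 1
print(slope_dense_tail, slope_light_tail)
```

The slope for $\alpha = 2.01$ is close to 2, while for $\alpha = 2.99$ it stays slightly above 1, because near $\alpha = 3$ the integral grows only logarithmically with $k_{max}$.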



[1] M. E. J. Newman. Networks - An Introduction. Oxford: 
Oxford University Press, 2010. 

[2] S. Fortunato. Community detection in graphs. Phys. Rep. 
486, 75-174 (2010). 

[3] G. Palla, I. Derényi, I. Farkas and T. Vicsek.
Uncovering the overlapping community structure of complex
networks in nature and society. Nature 435, 814-818 (2005).

[4] M. Rosvall, C. T. Bergstrom. Maps of random walks 
on complex networks reveal community structure. Proc. 
Natl Acad. Sci. USA 105, 1118-1123 (2008). 

[5] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai and 
A. L. Barabasi. Hierarchical organization of modularity 
in metabolic networks. Science 297, 1551-1555 (2002). 

[6] M. Sales-Pardo, R. Guimera, A. Moreira and L. Ama- 
ral. Extracting the hierarchical organization of complex 
systems. Proc. Natl Acad. Sci. USA 104, 15224-15229 
(2007). 

[7] A. Clauset, C. Moore and M. E. J. Newman. Hierarchical 
structure and the prediction of missing links in networks. 
Nature 453, 98-101 (2008). 

[8] G. Palla, A. Barabasi and T. Vicsek. Quantifying social 
group evolution. Nature 446, 664-667 (2007). 

[9] Y.-Y. Ahn, J. P. Bagrow and S. Lehmann. Link commu- 
nities reveal multiscale complexity in networks. Nature 
466, 761-764 (2010). 
[10] J. H. Fowler, C. T. Dawes and N. A. Christakis. Model
of genetic variation in human social networks. Proc. Natl
Acad. Sci. USA 106, 1720-1724 (2009).

[11] A. Lancichinetti, S. Fortunato, and F. Radicchi. Bench- 
mark graphs for testing community detection algorithms. 
Phys. Rev. E 78(4), 046110 (2008) 

[12] I. A. Kovács, R. Palotai, M. S. Szalay and P. Csermely.
Community Landscapes: An Integrative Approach to
Determine Overlapping Network Module Hierarchy, Identify
Key Nodes and Predict Network Dynamics. PLoS ONE
5(9): e12528. doi:10.1371/journal.pone.0012528 (2010).

[13] W. W. Zachary. An information flow model for conflict 
and fission in small groups, Journal of Anthropological 
Research 33, 452-473 (1977). 

[14] D. Lusseau, The emergent properties of a dolphin so- 
cial network. Biology Letters, Proc. R. Soc. London B 
(suppl.) (2003). DOI 10.1098/rsbl.2003.0057. 

[15] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. 
Slooten, and S. M. Dawson, The bottlenose dolphin com- 
munity of Doubtful Sound features a large proportion 
of long-lasting associations. Can geographic isolation ex- 
plain this unique trait?, Behavioral Ecology and Sociobi- 
ology (2003). DOI 10.1007/s00265-003-0651-y. 

[16] D. Lusseau and M. E. J. Newman. Identifying the role 
that animals play in their social networks. Proc. R. Soc. 
London B (Suppl.) 271, S477-S481 (2004) 

[17] D. Lusseau, Why Are Male Social Relationships Complex
in the Doubtful Sound Bottlenose Dolphin Population?,
PLoS ONE 2(4): e348 (2007).
[18] J. H. Michael, J. G. Massey. Modeling the communication 
network in a sawmill. Forest Products Journal 47 25-30 
(1997). 

[19] J. H. Michael. Labor dispute reconciliation in a forest 
products manufacturing facility. Forest Products Journal 
47 41-45 (1997). 

[20] M. E. J. Newman and M. Girvan. Physical Review E 69, 
026113 (2004). 

[21] A. Clauset, M. E. J. Newman and C. Moore. Finding
community structure in very large networks. Phys. Rev.
E 70, 066111 (2004).

[22] J. Duch and A. Arenas. Phys. Rev. E 72, 027104 (2005).

[23] M. E. J. Newman. Modularity and community structure
in networks. Proc. Natl Acad. Sci. USA 103, 8577-8582
(2006).

[24] S. Fortunato and M. Barthelemy, Proc. Natl. Acad. Sci.
USA 104, 36 (2007).

[25] J. M. Kumpula, J. Saramaki, K. Kaski, and J. Kertesz,
Eur. Phys. J. B 56, 41 (2007).



[26] L. K. Branting, in Proc. 2nd Workshop on Social Network 
Mining and Analysis, at 14th ACM SIGKDD Interna- 
tional Conf. on Knowledge Discovery and Data Mining 
(2008). 

[27] J. W. Berry, B. Hendrickson, R. A. LaViolette, and C.
A. Phillips, e-print, arXiv:0903.1072 (2009).

[28] B. H. Good, Y.-A. de Montjoye and A. Clauset, Performance
of modularity maximization in practical contexts, Phys.
Rev. E 81, 046106 (2010).

[29] J. Leskovec, Dynamics of large networks, Technical re- 
port CMU-ML-08-111, 2008. 

[30] R. Kleinberg, Geographic routing using hyperbolic space, 
In Proceedings of the 26th Annual Joint Conference of 
the IEEE Computer and Communications Societies (IN- 
FOCOM), 1902-1909 (2007). 

[31] M. E. J. Newman. The structure and function of complex 
networks, SIAM Review 45 167-256 (2003) 

[32] T. M. J. Fruchterman and E. M. Reingold. Graph Draw- 
ing by Force-Directed Placement. Software - Practice & 
Experience (Wiley) 21 (11): 1129-1164 (1991) 



